The predictive validity of scores on the National 
Board of Medical Examiners (NBME) Part I and Part II examinations for 
the selection of residents in orthopaedic surgery was investigated. 
Use of NBME scores has been criticized because of the time lag 
between taking Part I and entering residency and because Part I 
content is not directly linked to knowledge and skills required in 
residency. NBME Part I scores were obtained for 481 of the 1,050 
examinees who took the written component of the certification 
examination of the American Board of Orthopaedic Surgery (ABOS) in 
July 1988. Scores on Part II were available for 461 of these 
examinees. Statistically significant relationships were found between 
the ABOS examination and both NBME examinations. Part II of the NBME 
was a better predictor of ABOS performance than was Part I of the 
NBME. This study supports the belief that those who have done well on 
examinations continue to do well, possibly because of good 
test-taking skills. While the NBME examination should not be the sole 
determinant of acceptance in a residency program, the degree of 
correlation suggests that intelligent use of these scores provides an 
efficient and effective way to screen large numbers of applicants. 
One figure illustrates the relationship between failure on the ABOS 
and NBME scores. There is a 17-item list of references. (SLD) 
The purpose of this study was to investigate the predictive validity of NBME scores for the selection of residents 
in Orthopaedic Surgery. 



In selecting the best applicants for their programs, residency program directors face a considerable challenge. 
Many programs have hundreds of applicants for each available position, and it is common practice to use scores 
on the National Board of Medical Examiners (NBME) Part I and Part II examinations to identify applicants for 
further consideration (Nungester, 1990; Wagoner, 1986; McCollister, 1988). This use of NBME scores has been 
widely criticized, particularly for Part I, because of the time lag between taking Part I and entering residency, 
and because the content of Part I is not directly linked to the knowledge and skills required in residency. The 
NBME has also criticized this use of NBME scores because the examinations are not designed for this purpose. 



A first step in assessing the appropriateness of this use of NBME scores is to determine the strength of the 
relationship between the predictor variables (NBME scores) and the criterion (some measure of success in 
residency). Typical studies have used one of two criterion variables: either in-training exams, usually with small 
samples from a single program, which have shown inconsistent results (Spellacy, 1985; Warrick, 1986; Catalano, 1989); 
or ratings of resident performance, which have shown consistently low positive relationships (Keck, 1979; Markert, 1989; 
George, 1989; Yindra, 1988; Turner, 1987; Veloski, 1990; Distlehorst, 1988; Williams, 1987; Gunzberger, 1987; Wood, 1990). 

In contrast to previous research, this study used scores on a professionally developed specialty board certification 
examination as the criterion measure. While some believe that a relationship between NBME scores and other 
multiple-choice tests would reflect only method variance, the use of certification exams as a criterion has some 
basis in practicality. Program directors clearly want to accept applicants who will succeed in their programs, and 
one measure of success in residency is subsequent performance on the specialty board certification exam. 



Subjects. A total of 481 of the 1050 examinees who took the written component of the certification examination 
of the American Board of Orthopaedic Surgery (ABOS) in July 1988 were identified in the NBME database by 
self-reported social security numbers. These individuals took one or both of the NBME Part I and Part II 
examinations. NBME scores were obtained for 481 examinees on Part I and 461 examinees on Part II. 

Instrumentation. The ABOS examination was administered at a single site under secure conditions. The six-hour 
examination included 274 multiple-choice questions (MCQs). The examination assessed application of 
knowledge through the use of clinical vignettes combined with 100 radiographs, histosections, or other pictorial 
material that required examinees to interpret the information and formulate a diagnosis or a management plan. 

The NBME Part I exam contained approximately 980 MCQs covering the basic biomedical sciences of anatomy, 
behavioral sciences, biochemistry, microbiology, pathology, pharmacology, and physiology in approximately equal 
proportions. The NBME Part II exam contained approximately 900 questions covering the clinical sciences of 
internal medicine, obstetrics and gynecology, pediatrics, preventive medicine and public health, psychiatry, and 
surgery in approximately equal proportions (Volle, 1988). Part I is typically taken in the second year of medical 
school; Part II is typically taken in the fourth year. 
Characteristics of the Sample. Because only 481 of the 1050 ABOS examinees could be identified in the NBME 
database, the representativeness of the sample was investigated by comparing the ABOS exam performance of the 
sample, the ABOS reference group (i.e., graduates of LCME-accredited schools taking the exam for the first 
time), and the ABOS total group. The mean percent-correct score of the sample (M = 73; SD = 5) was slightly 
below that of the reference group (M = 75; SD = 5) and slightly above that of the total group (M = 70; SD = 5). 
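One way to gauge these differences is to express them in standard-deviation units. The short sketch below does this under the assumption that the SD of 5 reported for all three groups can serve as a common SD; the function and variable names are illustrative and do not come from the study.

```python
def standardized_difference(mean_a: float, mean_b: float, sd: float) -> float:
    """Difference between two means expressed in units of a common SD."""
    return (mean_a - mean_b) / sd

# Mean ABOS percent-correct scores reported for each group.
SAMPLE_MEAN = 73.0
REFERENCE_MEAN = 75.0
TOTAL_MEAN = 70.0
# Assumption: the SD of 5 reported for every group is treated as a common SD.
COMMON_SD = 5.0

vs_reference = standardized_difference(SAMPLE_MEAN, REFERENCE_MEAN, COMMON_SD)
vs_total = standardized_difference(SAMPLE_MEAN, TOTAL_MEAN, COMMON_SD)
print(f"sample vs reference group: {vs_reference:+.1f} SD")
print(f"sample vs total group:     {vs_total:+.1f} SD")
```

Under that assumption, the sample sits about 0.4 SD below the reference group and 0.6 SD above the total group.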

Performance on NBME Exams. Mean scores for the sample were 514 on Part I (SD = 92) and 494 on Part II 
(SD = 91). Mean scores for those who passed the ABOS exam (n = 429) were approximately 1 SD higher than 
for those who failed the ABOS exam (n = 52) on the NBME total scores and all subscores, with the exception 
of behavioral science, where the difference was only 22 points. 

Correlations between scores. Statistically significant relationships were found between the ABOS exam and both 
the NBME Part I and Part II exams. The ABOS exam score had an observed correlation of 0.49 with NBME Part I 
and 0.55 with Part II. The strongest correlations with Part I subtests were with physiology, anatomy, and 
biochemistry (.48, .44, and .41, respectively); somewhat weaker relationships were found with pharmacology, 
microbiology, and pathology (.39, .39, and .37, respectively); the weakest correlation was with behavioral science 
(r = .18). The strongest correlations with Part II subtests were with medicine and surgery (.50 and .48, 
respectively), followed by pediatrics and obstetrics/gynecology (.44 and .42, respectively); the weakest 
relationships were with preventive medicine and public health and with psychiatry (.37 and .36, respectively). 
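The coefficients above are observed (uncorrected) Pearson product-moment correlations. For readers who want to compute such a coefficient from paired score lists, a minimal sketch follows; the function name is illustrative, and the paired values in it are invented solely for demonstration and are not data from this study.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores (NBME-style scaled score, ABOS percent correct);
# invented for illustration only, not study data.
nbme = [420, 480, 510, 550, 600]
abos = [66, 71, 72, 76, 75]
print(round(pearson_r(nbme, abos), 2))
```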

Predictions of ABOS scores. Regression analyses indicated that Part II was a better predictor of performance 
on the ABOS exam than Part I (R² of 0.30 vs 0.23). Using Part I subscores as predictors was only slightly better 
than using the total Part I score (adjusted R² of 0.23 for the total score); using subscores for Part II did not 
result in a higher R² than using the total score. Using all Part I and Part II subscores was slightly better than 
using both total scores (adjusted R² of 0.33 vs 0.31). 
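Two standard relationships connect these results to the correlations reported above. With a single predictor, R² is the square of the correlation, so r = 0.55 gives R² ≈ 0.30 and r = 0.49 gives R² ≈ 0.24, close to the 0.23 reported. Adjusted R² discounts R² for the number of predictors p relative to the number of cases n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below implements both formulas; the hypothetical R² of 0.30, n = 461, and the six-predictor case (one per Part II subtest) in the usage lines are assumptions chosen for illustration, since the paper does not report the n used in each regression.

```python
def r_squared_from_r(r: float) -> float:
    """With one predictor, the regression R^2 equals the squared Pearson r."""
    return r * r

def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Standard adjusted R^2 for n cases and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Observed correlations reported for the NBME total scores.
print(f"Part I:  R^2 = {r_squared_from_r(0.49):.2f}")
print(f"Part II: R^2 = {r_squared_from_r(0.55):.2f}")

# Hypothetical illustration: the same R^2 of 0.30 with n = 461 (assumed),
# using one predictor vs six predictors (e.g., one per Part II subtest).
print(f"1 predictor:  adjusted R^2 = {adjusted_r_squared(0.30, n=461, p=1):.3f}")
print(f"6 predictors: adjusted R^2 = {adjusted_r_squared(0.30, n=461, p=6):.3f}")
```

Holding R² fixed, moving from one predictor to six lowers the adjusted value only slightly when n is in the hundreds.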

Figure 1 shows the likelihood of failure on the ABOS exam as a function of performance on the Part I and Part II 
examinations. For example, of the 19 examinees who scored below 350 on the NBME Part I exam, 11 (58%) 
failed the ABOS exam; of the 25 who scored between 350 and 400, 9 (36%) failed the ABOS exam. Similar 
results were found for Part II. The standard errors associated with failure rates are relatively large for scores 
under 350 (i.e., approximately 10), but decrease in the remaining sections of the curve. 
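The failure rates above are simple proportions, and one common way to attach a standard error to a proportion is the binomial formula SE = √(p(1 − p)/n). The paper does not state how its standard errors were computed, so the sketch below is an assumption; applied to the two Part I score bands quoted above, it gives about 11 percentage points for the band below 350 and about 10 for the band from 350 to 400. The function name is illustrative.

```python
import math

def failure_rate_with_se(failed: int, n: int) -> tuple[float, float]:
    """Return the observed failure proportion and its binomial standard error.

    Assumes the binomial (Wald) standard error; the paper does not specify its method.
    """
    if n <= 0 or not 0 <= failed <= n:
        raise ValueError("need 0 <= failed <= n and n > 0")
    p = failed / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se

# Part I score bands quoted in the text: (label, number failing ABOS, band size).
bands = [("below 350", 11, 19), ("350 to 400", 9, 25)]
for label, failed, n in bands:
    p, se = failure_rate_with_se(failed, n)
    print(f"{label}: {p:.0%} failed (SE = {100 * se:.1f} percentage points)")
```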



Educational or Scientific Importance of the Study 

This study supports the belief that those who have done well on exams in the past continue to do well on exams, 
but there are at least three potential explanations for this phenomenon that merit discussion. The first 
explanation, which is endorsed by critics of MCQ exams, is that these correlations are largely a reflection of 
test-taking skills, not knowledge. This argument is not as compelling for certification exams taken by physicians as 
it is for tests in elementary and high school. These examinees have demonstrated their ability to take tests, and 
these tests are more carefully crafted than standard classroom tests; the item flaws that reward testwiseness are 
virtually non-existent in these certification exams. 

The second potential explanation for the strong relationships found in this study is that performance on these 
tests indirectly reflects general ability, motivation, study skills, and other general traits that influence learning. 
Past achievement may be a good predictor of future achievement because of this indirect assessment. 
Figure 1. The relationship between failure rates on the ABOS exam and NBME Part I and Part II 